Content Management, also known as CM, is a set of processes and technologies that support the handling of digital information. This digital information is often referred to as digital content. Currently, people managing content have very few tools to tell them, a priori, whether users will be able to locate their content.
“Findability” is the term used to refer to the quality of being locatable or the ability to be found. Findability has become highly relevant with the expansion of the World Wide Web. However, findability is not limited to the web and can equally be applied to other environments. The structure, language and writing style used for content description all have a huge effect on the “findability” of content by users searching for information encapsulated in that content.
Currently, content providers who write content for Intranet and Internet web-sites find it very difficult to make the pages they write findable by potential consumers. The main reason is the difficulty of describing their content in a way that is optimal from the search engine's perspective. This means that while the content itself may suffice, it is written and formatted in such a way that the search engine employed to find the site, e.g., in response to a query or user input, ranks it lower than other pages which may be of lesser interest to people who search for specific content.
Currently, the main solution to this problem is to rely either on the experience of the people who generate the content or, more commonly, on experts in the field of Search Engine Optimization (SEO), who reformat pages so that the search engine values them more highly, based on their knowledge and experience.
However, both of these approaches rely on experience rather than on objective measures. SEO experts usually do not have access to the search engine's ranking methodology; hence their recommendations are mostly based on experience and common assumptions. Moreover, due to the complexity of the ranking algorithms used by state-of-the-art search engines, it is extremely difficult to predict in advance the effect of modifications in content and structure on the ranking of the web-pages in the set of results for specific queries.
In the patent literature, US Patent Publication No. 2003/0046389A1 describes a standard SEO system in which web site improvements are based on user behaviour in the web site. The publication discusses how SEO professionals may “advise” an owner of a Web site on how to write strategic and relevant “copy,” i.e., text, for a given Web page or Web site, and thus contemplates a human SEO expert recommending how to improve the site. However, there is no teaching or disclosure as to how to do so automatically, based on a findability analysis, without human expert intervention.
Japanese patent JP2001319129A2 refers to a general system which is based on comparing the behaviour of different search engines on the same web site.
European Patent application WO/07143395A2 refers to a system which proposes improvements based on general rules (generated by an expert) that would generally benefit the performance of a web page; however, these rules are not specific to any one website.
Other non-patent literature refers to the generation and implementation of simple heuristics (e.g., analysis of log files, analysis of incoming links) to suggest improvements to web pages.
It would be highly desirable to provide a system and method that operates in conjunction with a search engine for automatically analyzing a web-site or web page and making recommendations for improving document (e.g., web-page) and web-site findability.